Abstract: In the era of big data, many users and companies are moving their data to cloud storage to simplify data management and reduce maintenance costs. However, security and privacy have become major concerns because third-party cloud service providers are not always trustworthy. Although data contents can be protected by encryption, the access patterns, which carry important information, remain exposed to cloud providers and malicious attackers. In this paper, I apply the oblivious RAM (ORAM) algorithm to enable privacy-preserving access to big data deployed in distributed file systems built upon hundreds or thousands of servers in a single cloud site or in multiple geo-distributed cloud sites. Since the ORAM algorithm leads to a serious access load imbalance among storage servers, I study a data placement problem that aims at a load-balanced storage system with improved availability and responsiveness. Because this problem is NP-hard, I propose a low-complexity algorithm that scales to the large problem sizes that arise with big data. Extensive simulations show that the proposed algorithm finds results close to the optimal solution and significantly outperforms a random data placement algorithm.

Keywords: ORAM algorithm, big data, NP-hardness, random data placement algorithm.
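
The load imbalance mentioned in the abstract can be illustrated with a small sketch. It is not the algorithm proposed in this paper: it assumes a tree-based (Path ORAM style) layout, ignores server capacity and replication, and uses a generic greedy heuristic (heaviest bucket to the least loaded server) as a stand-in for a load-aware placement. The function names `bucket_loads`, `greedy_placement`, and `random_placement` are hypothetical.

```python
import random


def bucket_loads(height):
    """Relative access frequency of every bucket in a binary ORAM tree.

    Assumption: each access reads one root-to-leaf path chosen uniformly at
    random, so a bucket at depth d is touched with probability 2**-d.
    """
    return [2.0 ** -d for d in range(height + 1) for _ in range(2 ** d)]


def greedy_placement(loads, num_servers):
    """Generic greedy heuristic: heaviest bucket goes to the least loaded server."""
    totals = [0.0] * num_servers
    for load in sorted(loads, reverse=True):
        target = min(range(num_servers), key=totals.__getitem__)
        totals[target] += load
    return totals


def random_placement(loads, num_servers, seed=0):
    """Baseline: every bucket goes to a uniformly random server."""
    rng = random.Random(seed)
    totals = [0.0] * num_servers
    for load in loads:
        totals[rng.randrange(num_servers)] += load
    return totals


loads = bucket_loads(height=10)          # 2047 buckets, total load 11
greedy = greedy_placement(loads, num_servers=8)
rand = random_placement(loads, num_servers=8)
print("max server load, greedy:", max(greedy))
print("max server load, random:", max(rand))
```

Under these assumptions the root bucket alone carries as much load as an entire level of leaves, which is why a placement that ignores access frequency can leave one server far busier than the rest.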